AutoTagTCG : A Framework for Automatic Thai CG Tagging

Authors

  • Thepchai Supnithi
  • Taneth Ruangrajitpakorn
  • Kanokorn Trakultaweekool
  • Peerachet Porkaew
Abstract

Categorial grammar (CG) has recently attracted attention as a powerful grammar formalism. This paper presents a framework for automatic CG tagging of Thai. We investigated two main algorithms: conditional random fields (CRF) and a statistical alignment model based on information theory (SAM). SAM gave the best results at both the word level and the sentence level, with an accuracy of 89.25% at the word level and 82.49% at the sentence level. SAM outperforms CRF on known words, whereas CRF outperforms SAM on unknown words. Combining the two methods is therefore suited to handling both known and unknown words.
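The abstract reports accuracy at two granularities. The sketch below shows one common way to compute them, assuming word-level accuracy is the fraction of words that receive the correct tag and sentence-level accuracy is the fraction of sentences in which every word is tagged correctly. The abstract does not spell out these definitions, so they are an assumption, and the function names and the CG tags in the toy data are illustrative only.

```python
from typing import Sequence


def word_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Fraction of individual words whose predicted tag matches the gold tag."""
    total = sum(len(sentence) for sentence in gold)
    correct = sum(
        g_tag == p_tag
        for g_sent, p_sent in zip(gold, pred)
        for g_tag, p_tag in zip(g_sent, p_sent)
    )
    return correct / total if total else 0.0


def sentence_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Fraction of sentences in which every word is tagged correctly."""
    if not gold:
        return 0.0
    exact = sum(list(g_sent) == list(p_sent) for g_sent, p_sent in zip(gold, pred))
    return exact / len(gold)


# Toy data (hypothetical tags, not taken from the paper's corpus).
gold = [["np", r"s\np/np", "np"], ["np", r"s\np"]]
pred = [["np", r"s\np/np", "np"], ["np", "np"]]

print(word_accuracy(gold, pred))      # 4 of 5 words correct
print(sentence_accuracy(gold, pred))  # 1 of 2 sentences fully correct
```

Under these definitions a single wrong tag makes the whole sentence count as wrong, so sentence-level accuracy cannot exceed word-level accuracy. This is consistent with the two figures reported above.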

Similar resources

Multi-stage Annotation using Pattern-based and Statistical-based Techniques for Automatic Thai Annotated Corpus Construction

An automated or semi-automated annotation is a practical solution towards large-scale corpus construction. However, special characteristics of Thai language, such as lack of word-boundary and sentence-boundary markers, trigger several issues in automatic corpus annotation. This paper presents a multi-stage annotation framework, containing two stages of chunking and three stages of tagging. Two chu...

Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees

A method for deriving an approximately labeled dependency treebank from the Thai Categorial Grammar Treebank has been implemented. The method involves a lexical dictionary for assigning dependency directions to the CG types associated with the grammatical entities in the CG bank, falling back on a generic mapping of CG types in case of unknown words. Currently, all but a handful of the trees in...

Basic Principles for Segmenting Thai EDUs

This paper proposes a guideline to determine Thai elementary discourse units (EDUs) based on rhetorical structure theory. Carlson and Marcu’s (2001) guideline for segmenting English EDUs is modified to propose a suitable guideline for segmenting EDUs in Thai. The proposed principles are used in tagging EDUs for constructing a corpus of discourse tree structures. It can also be used as the basis ...

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is essential for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

The Automatic Thai Sentence Extraction

Unlike English, Thai has no explicit sentence marker. Conventionally, a space is placed at the end of a sentence in Thai writing, but a space does not always indicate a sentence boundary; it is also used for other purposes [Danvivathana 1987]. This paper presents an algorithm to extract sentences from a paragraph by detecting the true sentence-breaking spaces, by app...

Publication date: 2010